An Automaton for Motifs Recognition in DNA Sequences
Authors
Abstract
In this paper we present a new algorithm for finding inexact motifs, which are transformed into a set of exact subsequences, in a DNA sequence. The algorithm builds an automaton that searches the DNA database, which can be very long, for this set of exact subsequences. It begins with a preprocessing phase that builds the finite automaton; this phase also handles the case in which two different subsequences share a substring (that is, the subsequences may overlap), in a way similar to the KMP algorithm. During the search phase, the algorithm recognizes every occurrence in the DNA sequence of any subsequence in the input set. The automaton performs the search phase in time linear in the length of the input sequence. Experimental results show that the proposed algorithm outperforms the Aho-Corasick algorithm, which has been shown to outperform the naive approach and, moreover, is considered to run in linear time.
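The abstract does not spell out the authors' construction, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's algorithm. It builds the automaton the Aho-Corasick way (a trie over the exact subsequences, with KMP-style failure links compiled into a complete DFA over A, C, G, T), which is the baseline the paper compares against. It also assumes, hypothetically, that inexact motifs are written with IUPAC degenerate codes and expanded into exact strings; `expand_motif`, `build_dfa` and `search` are illustrative names, not from the source.

```python
from collections import deque
from itertools import product

# Assumption (not from the paper): inexact motifs use IUPAC degenerate codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
ALPHABET = "ACGT"


def expand_motif(motif):
    """Expand a degenerate motif into the set of exact strings it denotes."""
    choices = [IUPAC[c] for c in motif.upper()]
    return {"".join(p) for p in product(*choices)}


def build_dfa(patterns):
    """Compile non-empty uppercase ACGT patterns into a complete DFA.

    States are trie nodes. Missing transitions are filled from each state's
    failure link (longest proper suffix that is also a trie prefix), the
    KMP-style step that handles patterns which overlap one another.
    """
    delta = [{}]    # delta[state][symbol] -> next state
    output = [[]]   # patterns that end when this state is entered
    for pat in dict.fromkeys(patterns):  # drop duplicates, keep order
        state = 0
        for ch in pat:
            if ch not in delta[state]:
                delta.append({})
                output.append([])
                delta[state][ch] = len(delta) - 1
            state = delta[state][ch]
        output[state].append(pat)

    fail = [0] * len(delta)
    queue = deque()
    for ch in ALPHABET:  # depth-1 states fail to the root
        if ch in delta[0]:
            queue.append(delta[0][ch])
        else:
            delta[0][ch] = 0
    while queue:  # breadth-first: fail[state] is always completed first
        state = queue.popleft()
        output[state].extend(output[fail[state]])
        for ch in ALPHABET:
            nxt = delta[state].get(ch)
            if nxt is None:
                delta[state][ch] = delta[fail[state]][ch]
            else:
                fail[nxt] = delta[fail[state]][ch]
                queue.append(nxt)
    return delta, output


def search(text, delta, output):
    """One transition per symbol: linear in len(text) plus matches reported."""
    state, hits = 0, []
    for i, ch in enumerate(text.upper()):
        state = delta[state].get(ch, 0)  # symbols outside ACGT reset to root
        for pat in output[state]:
            hits.append((i - len(pat) + 1, pat))  # (start index, pattern)
    return hits


if __name__ == "__main__":
    patterns = expand_motif("GCR") | {"CAG"}  # {"GCA", "GCG", "CAG"}
    delta, output = build_dfa(patterns)
    print(search("TGCAGCG", delta, output))
    # → [(1, 'GCA'), (2, 'CAG'), (4, 'GCG')]
```

In the usage example the matches for `GCA` and `CAG` overlap, and both are reported in a single left-to-right pass. Compiling the failure links into a full transition table trades memory (four entries per state) for a search loop that never backtracks, which is what makes the linear-time scan straightforward.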
Similar resources
A Small Automaton for Word Recognition in DNA Sequences
A method for pattern analysis of DNA sequence data is considered. A space economical automaton for word recognition was presented elsewhere together with an algorithm for its compilation in linear time. An algorithm for the localization of words including imperfect matches (motif search) was developed. A program was implemented on the Macintosh and used extensively for the representation of the...
Structured Motifs Recognition in DNA sequences
This paper presents a methodology for structured motif recognition (SMR) in DNA sequences. The SMR problem consists of finding all instances of a triple pattern PL − PC − PR in a DNA sequence, where PL, PC and PR are based on the IUPAC alphabet, and PL and PR are each separated from PC by a distance of no more than "n" characters, which is provided as input. In this problem an inexact a...
Functional motifs in Escherichia coli NC101
According to recent reports, Escherichia coli (E. coli) bacteria can damage the DNA of gut lining cells and may encourage the development of colon cancer. Genetic switches are specific sequence motifs, and many of them are drug targets. It is therefore of interest to identify motifs and their locations in sequences. In the present study, the Gibbs sampler algorithm was used to predict and find functional m...
Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences
This work presents a hybrid method for motif discovery in DNA sequences. The proposed method, called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...
Structured Motifs Identification in DNA Sequences
In this paper, we present an algorithm that finds structured motifs in a DNA sequence. A structured motif consists of a central motif and one or two satellite motifs, which may be located to the left and/or right of the central motif. The search for the motifs is performed in two stages: first, the central motifs are located through an exact set matching process, which is implemented by a dete...
Journal title:
Volume, Issue:
Pages: -
Publication date: 2009